Towards Conceptual Indexing of the Blogosphere through Wikipedia Topic Hierarchy
Abstract
This paper studies the issue of conceptually indexing the blogosphere through the whole hierarchy of Wikipedia entries. About 300,000 Wikipedia entries are used for representing a hierarchy of topics. Based on the results of judging whether each blog feed is relevant to a given Wikipedia entry, this paper proposes how to judge whether there exist blog feeds to be linked from the given entry. In our experimental evaluation, we achieved over 90% precision in this task.
Similar resources
A Wikipedia Based Semantic Graph Model for Topic Tracking in Blogosphere
There are two key issues for information diffusion in blogosphere: (1) blog posts are usually short, noisy and contain multiple themes, (2) information diffusion through blogosphere is primarily driven by the “word-of-mouth” effect, thus making topics evolve very fast. This paper presents a novel topic tracking approach to deal with these issues by modeling a topic as a semantic graph, in which...
Predicting Central Topics in a Blog Corpus from a Networks Perspective
In today’s content-centric Internet, blogs are becoming increasingly popular and important from a data analysis perspective. According to Wikipedia, there were over 156 million public blogs on the Internet as of February 2011. Blogs are a reflection of our contemporary society. The contents of different blog posts are important from social, psychological, economical and political perspectives. ...
Topic Classification of Blog Posts Using Distant Supervision
Classifying blog posts by topics is useful for applications such as search and marketing. However, topic classification is time consuming and error prone, especially in an open domain such as the blogosphere. The state-of-the-art relies on supervised methods, requiring considerable training effort, that use the whole corpus vocabulary as features, demanding considerable memory to process. We sh...
Conceptual document indexing using a large scale semantic dictionary providing a concept hierarchy
Automatic indexing is one of the important technologies used for Textual Data Analysis applications. Standard document indexing techniques usually identify the most relevant keywords in the documents. This paper presents an alternative approach that aims at performing document indexing by associating concepts with the document to index instead of extracting keywords out of it. The concepts are ...
Topic structure extraction for meeting indexing
This paper describes a system that automatically generates meeting minutes by extracting a topic hierarchy from a meeting’s speech. The topic hierarchy is a tree structure whose nodes comprise a topic summary. The topic structure extraction process converts speech recognition results into a word conceptual vector sequence and divides the sequence into the topic segments (topic segmentation). It...